Lab 1

Part 1: Data Prep

First question: what language(s) are the reviews in?

A simple language guesser using the pycld2 library
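A minimal sketch of such a guesser, assuming pycld2 is installed. The function name `guess_language` and the fallback code `'un'` (the code CLD2 uses for "unknown") are illustrative choices, not necessarily what the lab used. The import guard is an added assumption so the sketch still runs where the library is missing.

```python
try:
    import pycld2 as cld2  # pip install pycld2
except ImportError:
    # Assumption: degrade gracefully when the library is not available
    cld2 = None


def guess_language(text):
    """Return a language code such as 'en', or 'un' when there is no reliable guess."""
    if cld2 is None or not isinstance(text, str):
        return "un"
    try:
        is_reliable, _bytes_found, details = cld2.detect(text)
    except cld2.error:
        # pycld2 raises this on input it cannot process (e.g. invalid UTF-8)
        return "un"
    # details holds up to three (name, code, percent, score) tuples, best guess first
    return details[0][1] if is_reliable else "un"
```

Very short reviews often come back as unreliable, so treating them as unknown rather than trusting the top guess is a conservative choice.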

Add a 'lang' column to our dataframe
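One way to add the column is to apply a guesser function to each review. In the sketch below, the `text` column name, the helper `add_lang_column`, and the toy guesser are all hypothetical; in the lab the guesser would be the pycld2-based one.

```python
import pandas as pd


def add_lang_column(df, guesser, text_col="text"):
    """Return a copy of df with a 'lang' column produced by guesser(text)."""
    out = df.copy()
    out["lang"] = out[text_col].apply(guesser)
    return out


def toy_guesser(text):
    # Stand-in for a real detector so the sketch runs without pycld2
    return "en" if " the " in f" {text.lower()} " else "un"


reviews = pd.DataFrame({"text": ["The room was spotless.", "La chambre était propre."]})
reviews = add_lang_column(reviews, toy_guesser)
```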

The values in the 'ratings' column are dictionaries. We need to expand them into separate columns using pandas.json_normalize.
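A small sketch of the expansion, using made-up data. The rating keys (`overall`, `service`, `cleanliness`) and the `text` column are assumptions for illustration; the real dataset may use different names.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Great stay", "Noisy room"],
    "ratings": [
        {"overall": 5.0, "service": 5.0},
        {"overall": 2.0, "cleanliness": 3.0},
    ],
})

# One column per dictionary key; keys missing from a row become NaN
ratings = pd.json_normalize(df["ratings"].tolist())
ratings.index = df.index  # keep rows aligned before concatenating

df = pd.concat([df.drop(columns="ratings"), ratings], axis=1)
```

Resetting `ratings.index` to match `df.index` matters if the dataframe was filtered earlier, since `pd.concat` aligns rows by index label rather than by position.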

Now the dataframe is ready to work with!

Part 2: Analysis

Load the data file

A simple language guesser using the pycld2 library

Add a 'lang' column to our dataframe

Select just the reviews in English
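The selection can be done with a boolean mask on the 'lang' column. The sample rows below are invented, and `'en'` is assumed to be the code the guesser returns for English.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Lovely hotel", "Très bon séjour", "Would stay again"],
    "lang": ["en", "fr", "en"],
})

# Keep only rows detected as English and renumber the index
english = df[df["lang"] == "en"].reset_index(drop=True)
```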

The values in the 'ratings' column are dictionaries. We need to expand them into separate columns using pandas.json_normalize.

Now the dataframe is ready to work with!

Generate Word Clouds
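A rough sketch of how a cloud for one rating level might be built: count words across the selected reviews, then hand the counts to the `wordcloud` package. The stopword list, helper names, and sample texts are illustrative assumptions, and the rendering step returns `None` if `wordcloud` is not installed.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real run would likely use a fuller one
STOPWORDS = {"the", "and", "was", "were", "a", "an", "to", "of", "in", "it", "is", "we", "for", "with", "very"}


def word_frequencies(texts):
    """Count lowercase words across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", str(text).lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


def make_cloud(frequencies):
    """Build a WordCloud from word counts, or return None if wordcloud is missing."""
    try:
        from wordcloud import WordCloud  # pip install wordcloud
    except ImportError:
        return None
    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies)


# Hypothetical texts standing in for the reviews at one rating level
four_star_texts = [
    "The staff were friendly and the room was clean.",
    "Clean room, friendly staff, great location.",
]
freqs = word_frequencies(four_star_texts)
```

Calling `make_cloud(freqs)` and passing the result to matplotlib's `imshow` would display the image; repeating the steps with the reviews for another rating gives the second cloud.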

4 Star Reviews

5 Star Reviews